(54) Fractional pixel motion estimation of video signals



(57) Fractional pixel motion estimation of video signals is performed by comparing a block of pixels from a
current image of the video signal with a plurality of displaced blocks within a search window from a previous/
future reference image of the video signal via a distortion function. The displaced block from the reference image
that produces the minimum value for the distortion function provides a center pixel. A general surface is fitted
(12) around this center pixel so that it equals the distortion function at each integer pixel location surrounding
the center pixel. The distortion function for the fitted surface is estimated (14) for fractional pixel locations, and
the motion vector corresponding to the minimum value for the fractional distortion function is selected (16) for
transmission as part of the compressed video.
Description

Background of the Invention 

The present invention relates to compression of motion video signals, and more particularly to fractional pixel
motion estimation of video signals for simplifying the determination of a best motion vector for each pixel of the video
signals during video compression.

Video and many medical images are received as sequences of two-dimensional image frames or fields. To transmit
such images as digital signals some form of compression is required. Three basic types of redundancy are exploited
in a video compression process: temporal redundancy, spatial redundancy and amplitude redundancy. Interframe
coding techniques make use of the redundancy between successive frames (temporal redundancy). In these techniques
the information defining elements of a picture, i.e., pixels, are estimated by interpolation or prediction using information
from related locations in preceding and/or succeeding versions of the picture, as exemplified in U.S. Patent No.
4,383,272 issued May 10, 1983 to Netravali et al entitled "Video Signal Interpolation Using Motion Estimation." A typical
compression encoder is shown in Fig. 1 where a video signal is input to a preprocessor and then into a motion estimator.
The motion estimator delays the video signal, to compensate for the processing delays of the motion vector generation
process, before providing the video signal to an encoder loop where compression is performed. The compression is
performed using a motion vector generated by the motion estimator, which is multiplexed with the compressed video
signal at the output of the encoder for transmission.

The interpolation between frames in the encoder is performed by first estimating the motion trajectory, i.e., motion
vector or displacement vector, of each pixel. If an estimate of such displacement is available, then more efficient
prediction may be performed in the encoder by relating to elements in a previous frame that are appropriately spatially
displaced. These displacement vectors are used to project each pixel along its trajectory, resulting in the motion
compensated prediction or interpolation. Once the motion vectors are determined, then the differences between consecutive
motion compensated frames that exceed a predetermined threshold are determined by the encoder loop as the
compressed video signal.

Most motion estimation in interframe coding assumes (i) objects move in translation, i.e., zoom and rotation are
not considered, (ii) illumination is spatially and temporally uniform, and (iii) occlusion of one object by another and
uncovered background are not considered. In practice motion vectors are estimated for blocks of pixels so that the
displacements are piecewise constant. Block matching is used to estimate the motion vector associated with each
block of pixels in a current coding frame or field, assuming that the object displacement is constant within a small
two-dimensional block of pixels. In these methods the motion vector for each block in the current frame or field is estimated
by searching through a larger search window in a previous frame/field and/or succeeding frame/field for a best match
using correlation or matching techniques. The motion estimator compares a block of pixels in the current frame with a
block in the previous or future frame by computing a distortion function, such as shown in Fig. 2. Each block in the
current frame is compared to displaced blocks at different locations in the previous or future frame within a search
window, and the displacement vector that gives the minimum value of the distortion function is selected as being the
best representation of the motion for that block.

Using the notation (row, column) to present a position in a picture, for a block of M×N pixels at (m,n) the distortion
function $D_{(m,n)}(i,j)$ for a displacement of (i,j) may be given as

$$D_{(m,n)}(i,j) = \sum_{k=1}^{M} \sum_{l=1}^{N} f\bigl(v(m+k,\, n+l) - u(m+k-i,\, n+l-j)\bigr)$$

where u(,) is the previous or future image, v(,) is the current image, and f(x) is a given positive and increasing function
of x. In general the candidate displacement vector (i,j) is restricted to a preselected $[-p_1, p_2] \times [-q_1, q_2]$ region, or search
window. Some useful choices for f(x) are $|x|$ and $x^2$. Minimizing $D_{(m,n)}(i,j)$ over the candidate (i,j)s for a given (m,n) gives the
displacement vector for the block at (m,n).
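
The block matching search described above can be sketched as follows. This is a minimal Python illustration, not the encoder's implementation: the function names `distortion` and `full_search`, the use of NumPy arrays, the zero-based block origin (the block covers rows m..m+M-1 rather than m+1..m+M), and the rule of skipping candidates that fall outside the reference image are all assumptions made for the example.

```python
import numpy as np

def distortion(cur, ref, m, n, M, N, i, j, f=np.abs):
    """D_(m,n)(i,j): sum of f(v - u) over an M x N block displaced by (i, j).

    cur is the current image v, ref is the previous/future image u.
    Assumption: the block's top-left pixel is (m, n), zero-based.
    """
    block = cur[m:m + M, n:n + N].astype(np.int64)
    cand = ref[m - i:m - i + M, n - j:n - j + N].astype(np.int64)
    return int(f(block - cand).sum())

def full_search(cur, ref, m, n, M, N, p1, p2, q1, q2):
    """Return (D_min, i0, j0) over the window [-p1, p2] x [-q1, q2]."""
    best = None
    for i in range(-p1, p2 + 1):
        for j in range(-q1, q2 + 1):
            r0, c0 = m - i, n - j
            # Assumption: candidates partly outside the reference are skipped.
            if r0 < 0 or c0 < 0 or r0 + M > ref.shape[0] or c0 + N > ref.shape[1]:
                continue
            d = distortion(cur, ref, m, n, M, N, i, j)
            if best is None or d < best[0]:
                best = (d, i, j)
    return best
```

With the default `f=np.abs` this is a sum of absolute differences; passing `f=np.square` gives the squared-error choice mentioned in the text.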

If i and j are both integers, minimization of the distortion function gives the motion vectors to an integer accuracy,
or a full pixel. Fractional pixel accuracy motion vectors usually give better motion compensated prediction than the full
pixel motion vectors. Fractional pixel accuracy motion vectors may be obtained by computing u(,) at fractional pixel
grid locations through spatial interpolation. However obtaining the fractional pixel accuracy motion vectors in this way is
computationally very expensive. Netravali et al, as described in the article entitled "A Codec for HDTV", IEEE Trans. Consumer
Electronics, vol. 38, pp. 325-340, Aug. 1992, use a simple scheme to approximate the half pixel motion vectors
independently horizontally and vertically using the distortion function computed at integer pixel locations. Let $D_{(m,n)}(i,j)$ be
minimum for $(i,j) = (i_0,j_0)$ (integer pixel accuracy). A parabola is fit to the three points around the minimum, and the
resulting equation is solved to find the position of the minimum of the curve. The process of computing the fractional
pixel accuracy motion vector $(i'_0,j'_0)$ for the block at (m,n) simplifies to solving

$$i'_0 = \begin{cases}
i_0 + 1/2; & 3D_{(m,n)}(i_0+1,j_0) - 2D_{(m,n)}(i_0,j_0) - D_{(m,n)}(i_0-1,j_0) < 0 \\
i_0 - 1/2; & 3D_{(m,n)}(i_0-1,j_0) - 2D_{(m,n)}(i_0,j_0) - D_{(m,n)}(i_0+1,j_0) < 0 \\
i_0; & \text{otherwise}
\end{cases}$$

and

$$j'_0 = \begin{cases}
j_0 + 1/2; & 3D_{(m,n)}(i_0,j_0+1) - 2D_{(m,n)}(i_0,j_0) - D_{(m,n)}(i_0,j_0-1) < 0 \\
j_0 - 1/2; & 3D_{(m,n)}(i_0,j_0-1) - 2D_{(m,n)}(i_0,j_0) - D_{(m,n)}(i_0,j_0+1) < 0 \\
j_0; & \text{otherwise}
\end{cases}$$

This fractional pixel motion estimation is performed by a motion vector refinement generator as shown in Fig. 2. Fig.
3 shows a block of pixels in a frame about the minimum integer pixel, and the fractional pixel locations that surround
that pixel for which the distortion function is determined by interpolation from the distortion function for the surrounding
pixels.
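
The one-dimensional parabola refinement described above can be illustrated with a short sketch. It assumes only what the text states, namely that a parabola is fitted through the three distortion values around the integer minimum and its vertex is located; the function names and the rounding of the vertex to the nearest half pixel are choices made for this example.

```python
def parabola_offset(d_minus, d_zero, d_plus):
    """Vertex of the parabola through (-1, d_minus), (0, d_zero), (1, d_plus).

    Returns the offset of the minimum relative to the integer position.
    """
    curvature = d_minus - 2.0 * d_zero + d_plus
    if curvature <= 0:
        # Flat or non-convex neighborhood: keep the integer position.
        return 0.0
    return 0.5 * (d_minus - d_plus) / curvature

def half_pel_refine_1d(d_minus, d_zero, d_plus):
    """Round the parabola vertex to the nearest of -1/2, 0, +1/2."""
    x = parabola_offset(d_minus, d_zero, d_plus)
    # For positive curvature, x > 1/4 is equivalent to
    # 3*d_plus - 2*d_zero - d_minus < 0, and x < -1/4 to
    # 3*d_minus - 2*d_zero - d_plus < 0.
    if x > 0.25:
        return 0.5
    if x < -0.25:
        return -0.5
    return 0.0
```

Applying `half_pel_refine_1d` once along the row direction and once along the column direction gives the two independent half-pixel corrections.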

What is desired is an improved fractional pixel motion estimation of video signals that provides greater accuracy.

Summary of the Invention

Accordingly the present invention provides fractional pixel motion estimation of video signals using a general
surface fit to a distortion function. A general surface is fit to a distortion function at a plurality of pixels in two dimensions
about a pixel $(i_0,j_0)$ having a minimum integer distortion function. The distortion function is estimated or interpolated at
fractional pixel locations surrounding the minimum distortion pixel as a function of the constants that define the general
surface and the values of the distortion function for the surrounding pixels. From the fractional pixel distortion function
values and the minimum integer pixel distortion function value the minimum value is selected, and the corresponding
fractional pixel location defines the motion vector for predicting the location of the pixels of the block in the previous/
future frame/field as part of the video compression process.

The objects, advantages and novel features of the present invention are apparent from the following detailed
description when read in conjunction with the appended claims and attached drawing.

Brief Description of the Drawing 

Fig. 1 is a block diagrammatic view of a video compression encoder using motion estimation according to the prior
art which may use the fractional pixel motion estimation according to the present invention.

Fig. 2 is a block diagrammatic view of a motion estimator according to the prior art which is suitable for performing
fractional pixel motion estimation according to the present invention.

Fig. 3 is a plan view of a portion of a frame representing a block of pixels that are processed according to the
present invention.

Fig. 4 is a block diagrammatic view of a motion vector refinement generator that performs the general fractional
pixel motion estimation according to the present invention.

Description of the Preferred Embodiment 

As opposed to a separate parabola for each of two dimensions, as discussed above, a general surface is fitted to
a distortion function $D_{(m,n)}(i,j)$ to compute an approximate fractional pixel motion vector. If $D_{(m,n)}(i,j)$ has a minimum for
$(i,j) = (i_0,j_0)$, then a fitted surface $\mathit{Ds}_{(m,n)}(i,j)$ to $D_{(m,n)}(i,j)$ around that point is given by

$$\begin{aligned}
\mathit{Ds}_{(m,n)}(i,j) ={}& c_0 (i-i_0)^2 (j-j_0)^2 + c_1 (i-i_0)^2 (j-j_0) + c_2 (i-i_0)(j-j_0)^2 + c_3 (i-i_0)^2 \\
&+ c_4 (j-j_0)^2 + c_5 (i-i_0)(j-j_0) + c_6 (i-i_0) + c_7 (j-j_0) + c_8
\end{aligned}$$

where $c_0$ through $c_8$ are constants defining the surface $\mathit{Ds}_{(m,n)}(i,j)$, and $\mathit{Ds}_{(m,n)}(i,j) = D_{(m,n)}(i,j)$ at the integer pixel locations
about $(i_0,j_0)$.

Then $c_0$ through $c_8$ are found by

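$$\begin{aligned}
c_0 ={}& \tfrac{1}{4}\bigl[D_{(m,n)}(i_0+1,j_0+1) + D_{(m,n)}(i_0-1,j_0+1) + D_{(m,n)}(i_0+1,j_0-1) + D_{(m,n)}(i_0-1,j_0-1) \\
& - 2\bigl[D_{(m,n)}(i_0,j_0-1) + D_{(m,n)}(i_0,j_0+1) + D_{(m,n)}(i_0-1,j_0) + D_{(m,n)}(i_0+1,j_0)\bigr] + 4D_{(m,n)}(i_0,j_0)\bigr] \\
c_1 ={}& \tfrac{1}{4}\bigl[-2D_{(m,n)}(i_0,j_0+1) + 2D_{(m,n)}(i_0,j_0-1) + D_{(m,n)}(i_0+1,j_0+1) + D_{(m,n)}(i_0-1,j_0+1) \\
& - D_{(m,n)}(i_0+1,j_0-1) - D_{(m,n)}(i_0-1,j_0-1)\bigr] \\
c_2 ={}& \tfrac{1}{4}\bigl[2D_{(m,n)}(i_0-1,j_0) - 2D_{(m,n)}(i_0+1,j_0) + D_{(m,n)}(i_0+1,j_0+1) + D_{(m,n)}(i_0+1,j_0-1) \\
& - D_{(m,n)}(i_0-1,j_0+1) - D_{(m,n)}(i_0-1,j_0-1)\bigr] \\
c_3 ={}& \tfrac{1}{2}\bigl[D_{(m,n)}(i_0-1,j_0) + D_{(m,n)}(i_0+1,j_0) - 2D_{(m,n)}(i_0,j_0)\bigr] \\
c_4 ={}& \tfrac{1}{2}\bigl[D_{(m,n)}(i_0,j_0-1) + D_{(m,n)}(i_0,j_0+1) - 2D_{(m,n)}(i_0,j_0)\bigr] \\
c_5 ={}& \tfrac{1}{4}\bigl[D_{(m,n)}(i_0+1,j_0+1) + D_{(m,n)}(i_0-1,j_0-1) - D_{(m,n)}(i_0-1,j_0+1) - D_{(m,n)}(i_0+1,j_0-1)\bigr] \\
c_6 ={}& \tfrac{1}{2}\bigl[D_{(m,n)}(i_0+1,j_0) - D_{(m,n)}(i_0-1,j_0)\bigr] \\
c_7 ={}& \tfrac{1}{2}\bigl[D_{(m,n)}(i_0,j_0+1) - D_{(m,n)}(i_0,j_0-1)\bigr] \\
c_8 ={}& D_{(m,n)}(i_0,j_0)
\end{aligned}$$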
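
The coefficient formulas above can be checked numerically with a small sketch. The 3×3 array layout `D[a + 1][b + 1]` for the distortion at $(i_0+a, j_0+b)$ and the names `fit_surface` and `surface` are assumptions made for this illustration; the arithmetic follows the nine expressions term by term.

```python
import numpy as np

def fit_surface(D):
    """Return (c0, ..., c8) for a 3x3 grid with D[a+1][b+1] = D(i0+a, j0+b)."""
    D = np.asarray(D, dtype=float)
    g = lambda a, b: D[a + 1, b + 1]
    c0 = 0.25 * (g(1, 1) + g(-1, 1) + g(1, -1) + g(-1, -1)
                 - 2.0 * (g(0, -1) + g(0, 1) + g(-1, 0) + g(1, 0))
                 + 4.0 * g(0, 0))
    c1 = 0.25 * (-2.0 * g(0, 1) + 2.0 * g(0, -1) + g(1, 1) + g(-1, 1)
                 - g(1, -1) - g(-1, -1))
    c2 = 0.25 * (2.0 * g(-1, 0) - 2.0 * g(1, 0) + g(1, 1) + g(1, -1)
                 - g(-1, 1) - g(-1, -1))
    c3 = 0.5 * (g(-1, 0) + g(1, 0) - 2.0 * g(0, 0))
    c4 = 0.5 * (g(0, -1) + g(0, 1) - 2.0 * g(0, 0))
    c5 = 0.25 * (g(1, 1) + g(-1, -1) - g(-1, 1) - g(1, -1))
    c6 = 0.5 * (g(1, 0) - g(-1, 0))
    c7 = 0.5 * (g(0, 1) - g(0, -1))
    c8 = g(0, 0)
    return (c0, c1, c2, c3, c4, c5, c6, c7, c8)

def surface(c, x, y):
    """Evaluate Ds at offset (x, y) = (i - i0, j - j0)."""
    return (c[0] * x * x * y * y + c[1] * x * x * y + c[2] * x * y * y
            + c[3] * x * x + c[4] * y * y + c[5] * x * y
            + c[6] * x + c[7] * y + c[8])
```

Evaluating `surface` at the nine integer offsets reproduces the input grid, which is the interpolation property the fitted surface is required to have.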



The above equations define the fitted surface $\mathit{Ds}_{(m,n)}(i,j)$. One way to find the fractional pixel accurate motion vector
is to compute the gradient of the surface representation of Ds and find the position that gives the minimum of Ds. This fractional
pixel accurate motion vector may then be quantized to a required precision.
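
One possible reading of the gradient approach is sketched below. The text does not say how the minimum is located, so the Newton iteration, the clamping of the offset to one pixel around the center, the early exit when the Hessian is not positive definite, and the names `surface_minimum` and `quantize` are all assumptions of this example. The coefficients `c` are taken to be $(c_0, \ldots, c_8)$ with offsets $x = i - i_0$ and $y = j - j_0$.

```python
def surface_minimum(c, iters=8, tol=1e-9):
    """Newton search for a stationary point of Ds, starting at the center."""
    x = y = 0.0
    for _ in range(iters):
        # Gradient of the nine-term surface.
        gx = (2 * c[0] * x * y * y + 2 * c[1] * x * y + c[2] * y * y
              + 2 * c[3] * x + c[5] * y + c[6])
        gy = (2 * c[0] * x * x * y + c[1] * x * x + 2 * c[2] * x * y
              + 2 * c[4] * y + c[5] * x + c[7])
        # Hessian entries.
        hxx = 2 * c[0] * y * y + 2 * c[1] * y + 2 * c[3]
        hxy = 4 * c[0] * x * y + 2 * c[1] * x + 2 * c[2] * y + c[5]
        hyy = 2 * c[0] * x * x + 2 * c[2] * x + 2 * c[4]
        det = hxx * hyy - hxy * hxy
        if hxx <= 0 or det <= 0:
            break  # not locally convex: keep the current estimate
        dx = -(hyy * gx - hxy * gy) / det
        dy = -(hxx * gy - hxy * gx) / det
        x = min(1.0, max(-1.0, x + dx))
        y = min(1.0, max(-1.0, y + dy))
        if abs(dx) < tol and abs(dy) < tol:
            break
    return x, y

def quantize(v, step=0.5):
    """Quantize an offset to the required precision (half pixel by default)."""
    return round(v / step) * step
```

The refined vector is then `(i0 + quantize(x), j0 + quantize(y))`; a finer `step` such as 0.25 gives quarter-pixel precision under the same assumptions.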

If the accuracy needed for the motion vector is known, another method may be used to estimate the fractional
pixel accurate motion vector. Using the equations describing Ds, one can estimate the distortion function value at
fractional pixel locations on a grid with desired accuracy spacing. For example if motion vectors are needed with
half-pixel accuracy, the following procedure may be used. Ds computed at half-pixel positions around $(i_0,j_0)$ simplifies to:

^Vn)(VO-5.yo) =0.375D(^„)(/o-1jo)-0.125D(^,j(/o+1Jo)+0.75D(^^j(/o^ 
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OS(„,n)('o+0-5.io) =0-375D(„,„)(/o+1 Jo) " 0.125D(„ Jq) + 0.75D^^ „^UoJo) 

s DS(m,n)('o.^-0-5) -0.375D(„ „)(/o,yo-1) - 0.125D(„ ^^(/oJo+l) + 0.75D(„ ^^C/qJo) 

OV,n)('0'>0+0-5) =0-375D(„ „)(/o,yo+1) - 0.125D(„ + 0.75D^^ „^{i^Jo) 

'° Os,„ „)(/o-0,5,yo+0.5) =0.28125D(„ „)(/o,yo+1) +0.28125D,, „)(/o-1,yo) + 

0.140625D(^ „)(/o-1,yo+1) -0.046875D(„ „)(/o-1,yo-1) - 
,S 0.09375D(^ - 0.046875D(^ „)(/o+1 .y^+l ) + 

0.015625D(„„)(/o+1,7o-1) - 0.09Z75D^^„^(i^,j^-\) + 
0-5625D(„„)(/o.yo) 

20 

OV,n)('o+0-5.yo+0-5) =0.28125D(„ „)(/o.yo+1) - 0.09375D(„ „)(/o-1.yo) - 
0.046875D,^^)(/o-1,yo+1) + 0.015625D,^„)(/o-1,yo-1) + 
0.281 25D,„ „j(/o+1 Jo) + 0.140625D(^ „j(/o+1,yo+1) - 
0.046875D(„ „)(/o+1,yo-1) - 0.09375D(„ „)(/o,yo-1) + 
so 0-5625D(„„)(/o,yo) 

OS(„,„)(/o-0-5.yo-0-5) -0.09375D(„ „)(/o,yo+1) +0.281 25D(„ „,(/o-1 Jo) - 
35 0.046875D(„„)(/o-1 Jo+1) +0.140625D(„„)(/o-1Jo-1) - 

0.09375D(„ „)(/o+1 Jo) + 0.015625D(^ ^^(/o+l Jo+I ) " 
0.046875D(„ „)(/o+1 + 0.281 25D(„ „j(/o,;o-1) + 
0.5625D(„„)(/oJo) 

OS(,„,n)('o+0-5Jo-0-5) =-0 09375D(„ „)(/oJo+1) -0.09375D(„ „j(/o-1,yo) + 
0.015625D(^„)(V1 Jo+1) - 0.046875D,„„j(;o-1 Jo-1) + 
0.28125D(„ „)(/o+1 Jo) - 0.046875D(„ „)(/o+1 Jo+D + 
50 0.140625D(„„)(/o+1 Jo-1) + 0.281 25D(„„)(/o,yo-1) + 

0-5625D(„„)(/oJo) 

Once the values of Ds at half-pixel positions are found from the above equations, the best half-pixel accurate motion
vector may be selected.
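
The half-pixel evaluation and selection can be sketched compactly. The sketch relies on an observation that is not stated in the text but can be verified from the printed coefficients: each half-pixel weight is the product of two one-dimensional weights taken from (0.375, 0.75, -0.125), for example 0.140625 = 0.375 × 0.375. The names `W`, `half_pel_distortions` and `best_half_pel`, and the 3×3 layout `D[a + 1][b + 1]` for the distortion at $(i_0+a, j_0+b)$, are assumptions of this illustration.

```python
import numpy as np

# One-dimensional weights applied to the samples at offsets -1, 0, +1.
W = {
    -0.5: np.array([0.375, 0.75, -0.125]),
    0.0: np.array([0.0, 1.0, 0.0]),
    0.5: np.array([-0.125, 0.75, 0.375]),
}

def half_pel_distortions(D):
    """Ds at the center and its eight half-pixel neighbors.

    Returns a dict keyed by the offset (di, dj) from (i0, j0).
    """
    D = np.asarray(D, dtype=float)
    return {(di, dj): float(W[di] @ D @ W[dj]) for di in W for dj in W}

def best_half_pel(D):
    """Offset (di, dj) with the smallest estimated distortion."""
    cand = half_pel_distortions(D)
    return min(cand, key=cand.get)
```

The refined motion vector is `(i0 + di, j0 + dj)`; the entry for offset `(0.0, 0.0)` is the integer-pixel distortion itself, so the integer position is kept whenever no half-pixel neighbor is estimated to be better.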

As shown in Fig. 4 the distortion function for the minimum distortion pixel $(i_0,j_0)$ and the surrounding pixels from
an integer-pixel accuracy distortion generator 10 is input to a surface fit generator 12 to produce the general surface
fit distortion function $\mathit{Ds}_{(m,n)}(i,j)$. The surface fit distortion function is then input to a fractional-pixel accuracy distortion
function generator 14 to produce the distortion function for the fractional pixel locations about the minimum integer
distortion pixel. The fractional pixel displacement vectors and the minimum integer distortion pixel $(i_0,j_0)$ are input to
an output motion vector selector 16 to produce the refined motion vector $(i'_0,j'_0)$ for the compression encoder
and for input to the compressed video signal for transmission.

Thus the present invention provides a fractional pixel motion estimation for video signals by fitting a general surface 
to a block of integer pixels about one having a minimum distortion function, and estimating the distortion function at 
fractional pixel locations to select the displacement vector with fractional pixel accuracy having the minimum distortion 
function as the motion vector for the block of the current image. 



Claims 

1. An improved method of fractional pixel motion estimation for a video compression signal encoder wherein a block 
of pixels in a current video image of an input video signal is compared with displaced blocks of pixels at different 
locations in a reference video image to provide a compressed video signal by generating a distortion function for 
each location and selecting a displacement vector that gives a minimum value of the distortion function as a motion 
vector for the block of pixels, the motion vector being used by the video compression signal encoder to predict the 
reference video image, the improvement comprising the steps of: 

fitting (12) a general surface to the distortion function so that the fitted surface equals the distortion function
at integer pixel locations surrounding a center pixel of the block having the minimum value of the distortion
function;

estimating (14) the distortion function at fractional pixel locations surrounding the center pixel; and

selecting (16), as the motion vector, the displacement vector that gives a minimum value for the fractional pixel
distortion function from among the center pixel and the fractional pixel locations surrounding the center pixel,
the selected motion vector being transmitted as part of the compressed video for the block of pixels.